Redoop: Supporting Recurring Queries in Hadoop

Authors

Chuan Lei

Elke A. Rundensteiner

Mohamed Y. Eltabakh

Abstract

The growing demand for large-scale data analytics, ranging from online advertisement placement and log processing to fraud detection, has led to the design of highly scalable data-intensive computing infrastructures such as the Hadoop platform. Recurring queries, executed repeatedly for long periods of time on rapidly evolving high-volume data, have become a bedrock component of most of these analytic applications. Despite their importance, plain Hadoop and its state-of-the-art extensions lack built-in support for recurring queries. In particular, they lack efficient and scalable analytics over evolving datasets. In this work, we present the Redoop system, an extension of the Hadoop framework designed to fill this void. Redoop supports recurring queries as first-class citizens in Hadoop without sacrificing any of its core features. More importantly, Redoop deploys innovative window-aware optimization techniques for recurring query execution, including adaptive window-aware data partitioning, window-aware task scheduling, and inter-window caching mechanisms. Redoop retains the fault tolerance of MapReduce via automatic cache recovery and task re-execution support. Our extensive experimental study with real datasets demonstrates that Redoop achieves runtime speedups of up to 9x compared to plain Hadoop.
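The idea of inter-window caching mentioned in the abstract can be illustrated with a minimal sketch. This is not Redoop's implementation or API: the class `RecurringWindowCount`, the notion of fixed-size "panes", and the `load_pane` callback are all hypothetical names introduced here. The sketch assumes a recurring count query over a sliding window, where consecutive windows overlap, so per-pane partial results computed for one execution can be reused by the next.

```python
from collections import Counter


class RecurringWindowCount:
    """Hypothetical sketch: a recurring count query over a sliding window.

    The window covers the last `window_panes` panes of input. Partial
    results are cached per pane so that overlapping windows do not
    reprocess data they have already seen.
    """

    def __init__(self, window_panes):
        self.window_panes = window_panes
        self.cache = {}       # pane_id -> Counter with that pane's partial result
        self.panes_read = 0   # how many panes were actually loaded and processed

    def _pane_result(self, pane_id, load_pane):
        # Process a pane only on a cache miss.
        if pane_id not in self.cache:
            self.cache[pane_id] = Counter(load_pane(pane_id))
            self.panes_read += 1
        return self.cache[pane_id]

    def run(self, latest_pane, load_pane):
        """Execute the query for the window ending at `latest_pane`."""
        first = max(0, latest_pane - self.window_panes + 1)
        total = Counter()
        for pane_id in range(first, latest_pane + 1):
            total.update(self._pane_result(pane_id, load_pane))
        # Evict panes that have slid out of the window.
        for pane_id in [p for p in self.cache if p < first]:
            del self.cache[pane_id]
        return total


# Usage: four panes of made-up input, window of three panes.
panes = {0: ["a", "b"], 1: ["a"], 2: ["c"], 3: ["a", "c"]}
query = RecurringWindowCount(window_panes=3)
first_result = query.run(2, panes.__getitem__)   # reads panes 0, 1, 2
second_result = query.run(3, panes.__getitem__)  # reads only pane 3
```

In this toy setting the second execution loads one new pane instead of three. A real system would also have to handle cache placement across nodes and recovery after failures, which the sketch ignores.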


Similar resources

Redoop Infrastructure for Recurring Big Data Queries

This demonstration presents the Redoop infrastructure, the first full-fledged MapReduce framework with native support for recurring big data queries. Recurring queries, executed repeatedly for long periods of time over evolving high-volume data, have become a bedrock component of most large-scale data analytic applications. Redoop is a comprehensive extension to Hadoop that pushes the supp...

Shared Execution of Recurring Workloads in MapReduce

With the increasing complexity of data-intensive MapReduce workloads, Hadoop must often accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated datasets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commo...

Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce

Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous ...

ST-Hadoop: A MapReduce Framework for Spatio-Temporal Data

This paper presents ST-Hadoop, the first full-fledged open-source MapReduce framework with native support for spatio-temporal data. ST-Hadoop is a comprehensive extension to Hadoop and SpatialHadoop that injects spatio-temporal data awareness into each of their layers, mainly the language, indexing, and operations layers. In the language layer, ST-Hadoop provides built-in spatio-temporal data t...

Hadoop-GIS: A High Performance Spatial Query System for Analytical Medical Imaging with MapReduce

Querying and analyzing large volumes of spatially oriented scientific data is becoming increasingly important for many applications. For example, analyzing high-resolution digital pathology images through computer algorithms provides rich spatially derived information about micro-anatomic objects in human tissues. The spatially oriented information and queries at both cellular and sub-cellular scales sh...

Publication date: 2014
